Incorporating durational modification in voice transformation
Abstract
Voice transformation is the process of using a small amount of speech data from a target speaker to build a transformation model that can generate arbitrary speech that sounds like the target speaker. A common current technique is to build Gaussian Mixture Models that map spectral aspects from source to target speakers. This paper proposes the use of duration models to improve the transformation models and the quality of the output speech. Testing across seven target speakers shows a statistically significant improvement in a popular objective metric when duration modification is performed during both training and testing of a voice transformation system based on Gaussian Mixture Model mapping.
Similar resources
Impact of durational outlier removal from unit selection catalogs
Outlier removal is a straightforward technique for improving the quality of unit selection catalogs without hand correction. This paper investigates the use of phone durations as a criterion for removing bad units. Scoring conditioned on linguistic context is demonstrably better than statistics based on phone class alone. The impact of voice modification is evaluated with a 444K utterance test corpus.
Prosodic Cues for Hesitation
In our efforts to model spontaneous speech for use in, for example, spoken dialogue systems, a series of experiments has been conducted to investigate correlates of perceived hesitation. Previous work has shown that it is the total duration increase that is the valid cue rather than the contribution by either of the two factors, pause duration and final lengthening. In the present expe...
F0 transformation within the voice conversion framework
In this paper, several experiments on F0 transformation within the voice conversion framework are presented. The conversion system is based on a probabilistic transformation of line spectral frequencies and residual prediction. Three probabilistic methods of instantaneous F0 transformation are described and compared. Moreover, a new modification of inter-speaker residual prediction is proposed ...
Voice Impersonation using Generative Adversarial Networks
Voice impersonation is not the same as voice transformation, although the latter is an essential element of it. In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker. In this paper, we propose a novel...
Durational Correlates of Word-initial Voiceless Geminate Stops: The Case of Kelantan Malay
This paper investigates the production of word-initial geminate consonants in Kelantan Malay with a focus on voiceless stops. It presents an acoustic phonetic analysis examining two acoustic parameters: closure duration and voice onset time (VOT). Evidence from a production experiment indicates that there is a clear durational contrast between word-initial voiceless geminate stops and their sing...